Research and Implementation of Distributed Full-text Retrieval System Based on Solr

doi: 10.3969/j.issn.1006-2475.2012.11.042

Computer and Modernization, 2012, Vol. 1, Issue (11): 171-176.

LI Dai-wei, LI Ning

Department of Information Technology and Application System, North China Institute of Computing Technology, Beijing 100083, China

Received: 2012-07-13 Online: 2012-11-10 Published: 2012-11-10

Abstract

Abstract: With the rapid growth of network information resources, traditional retrieval systems struggle to provide efficient and reliable service when processing massive amounts of data. To address this, the paper designs and implements a distributed full-text retrieval system based on Solr. The system uses a Web crawler to fetch Web pages and stores the fetched content as text files. It then uses the Solr index processing module to build indexes in parallel on multiple computer nodes, which effectively increases indexing speed. The cluster is managed through Zookeeper, and the search module is designed to be distributed, which effectively improves retrieval performance. Finally, a user-friendly interface is provided. At present, the system runs stably on data sets on the order of millions of records and has strong practical value.

Key words: full-text search, Solr, distributed, Zookeeper
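
The abstract names two distributed ideas without giving implementation details: building indexes in parallel on several nodes, and a search module that sends a query to every node and combines the answers. The following is a minimal Python sketch of that general pattern, not the authors' code and not the Solr or Zookeeper API. All names (`Shard`, `route`, `build_index`, `distributed_search`) are hypothetical, threads stand in for separate machines, hash-based routing is an assumption, and scoring is plain term frequency rather than Solr's relevance ranking.

```python
import heapq
import zlib
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Hypothetical in-memory stand-in for one index node."""

    def __init__(self):
        # term -> Counter mapping doc_id to term frequency
        self.postings = defaultdict(Counter)

    def index(self, docs):
        for doc_id, text in docs:
            for term in text.lower().split():
                self.postings[term][doc_id] += 1

    def search(self, query, k):
        scores = Counter()
        for term in query.lower().split():
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        # Return only this shard's local top-k as (score, doc_id) pairs.
        return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


def route(doc_id, num_shards):
    # Assumed routing rule: CRC32 is stable across runs, unlike hash() on str.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_index(docs, num_shards):
    shards = [Shard() for _ in range(num_shards)]
    batches = [[] for _ in range(num_shards)]
    for doc_id, text in docs:
        batches[route(doc_id, num_shards)].append((doc_id, text))
    # Each shard indexes its own batch; threads only imitate separate nodes.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        list(pool.map(lambda pair: pair[0].index(pair[1]), zip(shards, batches)))
    return shards


def distributed_search(shards, query, k=3):
    # Fan the query out to every shard, then merge the local top-k lists.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: s.search(query, k), shards))
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [(doc_id, score) for score, doc_id in merged]


docs = [
    ("d1", "solr distributed search"),
    ("d2", "solr solr index"),
    ("d3", "zookeeper cluster"),
    ("d4", "full text search with solr solr solr"),
]
shards = build_index(docs, num_shards=2)
print(distributed_search(shards, "solr", k=2))  # [('d4', 3), ('d2', 2)]
```

Because every document lives on exactly one shard, the global top-k is always contained in the union of the per-shard top-k lists, so the merge step gives the same ranking as a single index would. In a real deployment, cluster membership and shard layout would come from a coordination service such as Zookeeper rather than from a fixed list.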

CLC Number: TP311.133.1

LI Dai-wei, LI Ning. Research and Implementation of Distributed Full-text Retrieval System Based on Solr[J]. Computer and Modernization, 2012, 1(11): 171-176.

